FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems
Abstract
In this paper, a methodological data condensation approach for reducing tabular big datasets in classification problems is presented, named FDR2-BD. The key of our proposal is to analyze the data in a dual way (vertical and horizontal), so as to provide a smart combination between feature selection, to generate dense clusters of data, and uniform sampling reduction, to keep only a few representative samples from each problem area. Its main advantage is allowing the model's predictive quality to be kept within a range determined by a user's threshold. Its robustness is built on a hyper-parametrization process, in which all data are taken into consideration by following a k-fold procedure. Another significant capability is being fast and scalable by using fully optimized parallel operations provided by Apache Spark. An extensive experimental study is performed over 25 big datasets with different characteristics. In most cases, the obtained reduction percentages are above 95%, thus outperforming state-of-the-art solutions such as FCNN_MR, which barely reach 70%. The most promising outcome is maintaining the representativeness of the original data information, with prediction values within around 1% of the baseline.
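The horizontal (sampling) half of this idea can be illustrated with a small, single-machine sketch. It is not the FDR2-BD procedure: the grid-cell grouping, the function name `reduce_uniform`, and the parameters `selected`, `keep_fraction`, and `bins` are assumptions made here for illustration, and plain Python stands in for Apache Spark. Rows are grouped by class label and by a coarse cell over a few selected features, and a uniform random sample is kept from each group.

```python
import random
from collections import defaultdict


def reduce_uniform(rows, labels, selected, keep_fraction, bins=4, seed=0):
    """Keep a uniform random sample from each (label, grid cell) group.

    Hypothetical illustration: the grouping rule and all parameter names
    are assumptions, not the method described in the paper.
    """
    rng = random.Random(seed)
    # Per-feature range over the selected columns, used to bin values.
    lo = {j: min(r[j] for r in rows) for j in selected}
    hi = {j: max(r[j] for r in rows) for j in selected}

    def cell(row):
        key = []
        for j in selected:
            span = hi[j] - lo[j]
            if span == 0:
                key.append(0)
            else:
                # Clamp so the maximum value falls in the last bin.
                key.append(min(bins - 1, int((row[j] - lo[j]) / span * bins)))
        return tuple(key)

    # Group row indices by class label and grid cell.
    groups = defaultdict(list)
    for i, (row, y) in enumerate(zip(rows, labels)):
        groups[(y, cell(row))].append(i)

    # Keep at least one representative per group.
    kept = []
    for members in groups.values():
        k = max(1, round(len(members) * keep_fraction))
        kept.extend(rng.sample(members, k))
    return sorted(kept)


# Toy data: three features, of which only the first two are "selected".
rows = [(x / 100.0, (x * 7 % 100) / 100.0, 0.0) for x in range(200)]
labels = [x % 2 for x in range(200)]
kept = reduce_uniform(rows, labels, selected=[0, 1], keep_fraction=0.05)
reduction = 1 - len(kept) / len(rows)
print(f"kept {len(kept)} of {len(rows)} rows, reduction = {reduction:.2%}")
```

In a fuller pipeline, one would presumably train a classifier on the kept rows and compare its quality against the full-data baseline under the user's threshold; that step is omitted here.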
Similar resources
A Distributed Recommendation Platform for Big Data
The vast amount of information that recommenders manage these days has reached a point where scalability has become a critical factor. In this work, we propose a scalable architecture designed for computing Collaborative Filtering recommendations in a Big Data scenario. In order to build a highly scalable and fault-tolerant platform, we employ fully distributed systems without any single point ...
A Fuzzy TOPSIS Approach for Big Data Analytics Platform Selection
Big data sizes are constantly increasing. Big data analytics is where advanced analytic techniques are applied on big data sets. Analytics based on large data samples reveals and leverages business change. The popularity of big data analytics platforms, which are often available as open-source, has not remained unnoticed by big companies. Google uses MapReduce for PageRank and inverted indexes....
Correct classification for big/smart/fast data machine learning
Table (database) / Relational database. Classification for big/smart/fast data machine learning is one of the most important tasks of predictive analytics and of extracting valuable information from data. It is a core applied technique for what is now understood as data science and/or artificial intelligence. Widely used Decision Tree (Random Forest) and rarely used rule-based PRISM, VFST, etc. classif...
Fuzzy Data Envelopment Analysis for Classification of Streaming Data
The classification of fuzzy uncertain data is considered one of the most challenging issues in data analysis. In spite of the significance of fuzzy data in mathematical programming, the development of the analytical methods of fuzzy data is slow. Therefore, the current study proposes a new fuzzy data classification method based on fuzzy data envelopment analysis (DEA) which can handle strea...
Journal
Journal title: Electronics
Year: 2021
ISSN: 2079-9292
DOI: https://doi.org/10.3390/electronics10151757